Abstract
Bacteremia is clinically significant because it can progress rapidly to sepsis, which is life-threatening if not promptly managed. Early identification and timely initiation of appropriate antibiotic therapy are crucial for improving outcomes. However, because blood culture results typically require more than 48 hours, clinicians must often make empiric treatment decisions without them, which contributes to the overuse or misuse of antibiotics and the associated adverse effects. To address this gap, we developed and externally validated a multi-institutional machine-learning score that, at the moment a blood culture is ordered, delivers both the probability of bloodstream infection and the predicted Gram class of the causative pathogen.
We extracted 58 structured EHR covariates (age, sex, 11 chronic-disease flags, and 46 routinely obtained laboratory and vital-sign variables) from all adult inpatients who underwent blood-culture testing at Asan Medical Center (AMC) between March 2019 and December 2024 (development cohort, n = 86,802). For model derivation, we retained encounters with ≤ 5 % missing features, yielding 18,870 culture events (1,305 bacteremia-positive, 17,565 negative); these were randomly partitioned 4:1 into training and internal-test subsets. A stacked ensemble comprising gradient-boosted decision trees and a transformer encoder was optimized to maximize the F₁ score for bacteremia detection. The Gram-class classifier was trained on the 1,216 mono-microbial bacteremia cases (835 Gram-negative, 381 Gram-positive; ≤ 3 % missingness) using the same 4:1 split. External validation used all blood-culture encounters from Severance Hospital between March 2022 and December 2024 (n = 39,861), with no feature exclusion other than unit harmonization. Models were locked before external inference; performance was reported as ROC-AUC, sensitivity, specificity, and calibration metrics for both the AMC hold-out set and the full Severance cohort.
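The cohort-selection steps described above (a per-encounter missingness cutoff followed by a random 4:1 partition) can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function names, the toy feature names (`f0` to `f19`), the record layout, and the seed are all hypothetical, and the sketch assumes a simple unstratified shuffle because the abstract does not say whether the split was stratified.

```python
import random

def missing_fraction(record, features):
    """Fraction of the listed features that are absent (None) in one encounter."""
    missing = sum(1 for name in features if record.get(name) is None)
    return missing / len(features)

def filter_by_missingness(records, features, max_fraction):
    """Keep encounters whose missing-feature fraction is at most max_fraction."""
    return [r for r in records if missing_fraction(r, features) <= max_fraction]

def split_4_to_1(records, seed=0):
    """Shuffle, then partition into 80% training and 20% internal-test subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 4) // 5
    return shuffled[:cut], shuffled[cut:]

# Toy data with 20 hypothetical features (the study used 58 covariates).
FEATURES = [f"f{i}" for i in range(20)]
complete = {name: 1.0 for name in FEATURES}
one_missing = dict(complete, f0=None)       # 1/20 = 5% missing: retained
two_missing = dict(one_missing, f1=None)    # 2/20 = 10% missing: excluded

records = [complete] * 8 + [one_missing] * 2 + [two_missing] * 5
kept = filter_by_missingness(records, FEATURES, max_fraction=0.05)
train, test = split_4_to_1(kept, seed=42)
print(len(kept), len(train), len(test))     # prints: 10 8 2
```

With an outcome as imbalanced as the one reported here (1,305 positives among 18,870 events), a stratified split would keep the positive rate equal across subsets; whether the authors did so is not stated.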
In the internal hold-out set from AMC, the bacteremia model achieved an ROC-AUC of 0.854. When applied without retraining to Severance encounters with ≤ 25 % feature missingness (n = 23,189), discrimination remained robust (ROC-AUC = 0.800). Tightening the inclusion criterion to ≤ 5 % missingness, which restricted the Severance test set to 6,016 culture events, raised the ROC-AUC to 0.902, suggesting that data completeness was a principal driver of cross-site accuracy. Exploratory stratification also revealed marked differences in baseline comorbidities between the two sites, which may account for part of the inter-site difference in predictive performance.
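The comparison across missingness thresholds amounts to recomputing ROC-AUC on nested subsets of the external cohort. The sketch below shows one way to do that with a pairwise (Mann-Whitney) definition of ROC-AUC. The function names and the eight toy rows are hypothetical and chosen only to make the arithmetic easy to follow; they are not study data and do not reproduce the reported values.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def auc_by_missingness(rows, max_fraction):
    """rows holds (label, score, missing_fraction) tuples.
    Returns (subset size, ROC-AUC) for rows at or below the cutoff."""
    subset = [(y, s) for y, s, m in rows if m <= max_fraction]
    labels, scores = zip(*subset)
    return len(subset), roc_auc(labels, scores)

# Toy rows: (culture positive?, model score, fraction of features missing)
rows = [
    (1, 0.9, 0.00), (0, 0.1, 0.02), (1, 0.8, 0.05), (0, 0.2, 0.04),
    (1, 0.3, 0.20), (0, 0.6, 0.25), (0, 0.4, 0.15), (1, 0.5, 0.10),
]
for cutoff in (0.25, 0.05):
    n, auc = auc_by_missingness(rows, cutoff)
    print(f"missingness <= {cutoff:.2f}: n = {n}, ROC-AUC = {auc:.4f}")
```

The pairwise loop is quadratic in cohort size and is used here only for transparency; a rank-based implementation would be preferable at the scale of the Severance cohort.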
For culture-positive encounters, the Gram-class classifier distinguished Gram-negative from Gram-positive bacteremia with an ROC-AUC of 0.802 on the AMC internal test set. At the operating point selected during training, the model correctly identified 75 % of Gram-positive cases and 74 % of Gram-negative cases; a Gram-negative prediction carried the highest post-test probability, with a positive predictive value of 0.87. Permutation-based feature-importance analysis identified procalcitonin, erythrocyte sedimentation rate, and alkaline phosphatase as the highest-impact covariates, consistent with known pathogen-specific inflammatory profiles. These results suggest that, once bacteremia is flagged, the model provides clinically actionable guidance on the likely Gram class of the organism, offering a biologically plausible path to more precise empiric antibiotic selection.
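The reported positive predictive value can be checked against the stated per-class accuracies using Bayes' rule. The sketch below treats Gram-negative as the positive class, so sensitivity is 0.74 and specificity is 0.75 (the fraction of Gram-positive cases correctly identified). It also assumes that the Gram-negative share of the internal test set matches the 835 of 1,216 reported for the whole mono-microbial cohort; the abstract does not give the test-set composition, so this is an approximation.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = TP / (TP + FP), expressed per unit of population."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumption: test-set prevalence equals the overall cohort share.
prevalence = 835 / 1216                     # about 0.687 Gram-negative
ppv = positive_predictive_value(0.74, 0.75, prevalence)
print(round(ppv, 2))                        # prints: 0.87
```

Under these assumptions the three figures are mutually consistent. The same calculation shows that the PPV depends on the local Gram-negative share, so it would shift at a site with a different pathogen mix.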